Cross-View Feature Learning for Scalable Social Image Analysis

Authors

  • Wenxuan Xie
  • Yuxin Peng
  • Jianguo Xiao

Abstract

Nowadays, images on social networking websites (e.g., Flickr) are mostly accompanied by user-contributed tags, which cast new light on conventional content-based image analysis tasks such as image classification and retrieval. To establish a scalable social image analysis system, two issues need to be considered: 1) supervised learning is a futile approach to modeling the enormous number of concepts in the world, whereas unsupervised approaches overcome this hurdle; 2) algorithms must be efficient in both space and time to handle large-scale datasets. In this paper, we propose a cross-view feature learning (CVFL) framework to handle social image analysis effectively and efficiently. By explicitly modeling the relevance between image content and tags (which is empirically shown to be visually and semantically meaningful), CVFL yields more promising results than existing methods in the experiments. More importantly, being general and descriptive, CVFL and its variants can be readily applied to other large-scale multi-view tasks in the unsupervised setting.

Introduction

Over the past years, content-based image analysis tasks such as image classification and retrieval (Smeulders et al. 2000; Torres et al. 2009) have long been plagued by the gap between low-level representations and high-level semantics, i.e., the semantic gap. How to construct visual representations that properly reflect the underlying semantic meaning remains an open problem. Fortunately, whilst it may be difficult to bridge the semantic gap by relying solely on image content, the advent of social networking websites (e.g., Flickr) brings new opportunities to content-based image analysis by providing user-contributed tags for social images. Although sometimes inaccurate and incomplete, tags are easily available and beneficial for social image analysis.
For convenience, a social image in this paper consists of three components: image, tag and label. Image refers to the visual content, tag refers to the associated user-contributed tags, and label refers to the semantic concepts.

∗Corresponding author.
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Using these three components, many supervised and semi-supervised approaches to social image analysis have been proposed, e.g., multiple kernel learning (Lanckriet et al. 2004; Wang et al. 2010), feature selection (Guyon and Elisseeff 2003; Xu et al. 2010) and distance metric learning (Yang and Jin 2006; Bilenko, Basu, and Mooney 2004). However, despite the effectiveness of these approaches, two important issues must still be considered in order to establish a scalable social image analysis system:

Numerous concepts. Large datasets come with many more concepts. For instance, ImageNet currently counts approximately 22K concepts (Deng et al. 2009), which makes modeling so many, often visually similar, concepts a futile task. Moreover, the number of concepts in the world can far exceed 22K. Unsupervised approaches¹ avoid modeling these concepts altogether and thus overcome this hurdle. Consequently, we focus on unsupervised approaches in this paper, i.e., only image content and tags can be utilized.

¹ Image-tag pairs are also a kind of supervision information. However, in this paper, we use the term supervision information only for the label of a social image.

Scalability. Given the enormous number of social images on the web, algorithms that lack space or time efficiency are impractical. Therefore, to handle large datasets, we focus in this paper on algorithms whose time and space complexities are both at most O(N) with respect to the data size N.

Under the constraints of these two issues, algorithms that handle scalable social image analysis should be efficiently defined over feature representations. For example, feature combination (Atrey et al. 2010) is a straightforward method. Despite its simplicity, feature combination has been empirically demonstrated to be effective, possibly because image content and tags are two different and complementary views. Based on the combined feature representation, principal component analysis (PCA) (Hotelling 1933) extracts a compressed representation by maximizing the variance in the principal subspace. Moreover, instead of simply combining image and tags, algorithms such as canonical correlation analysis (CCA) (Hotelling 1936) have been proposed …
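The feature-combination baseline followed by PCA can be sketched as below. This is a minimal pure-Python illustration under our own assumptions, not the paper's CVFL method: `combine_features` and `pca` are hypothetical helper names, each view is assumed to be a dense fixed-length vector per image (e.g., a visual descriptor and a bag-of-tags vector), and principal directions are found by power iteration with deflation rather than a full eigendecomposition. Building the d×d covariance visits each of the N images a constant number of times, so for a fixed dimensionality d both time and space grow linearly in N, consistent with the O(N) constraint stated in the Scalability paragraph.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def combine_features(image_feats: Matrix, tag_feats: Matrix) -> Matrix:
    """Feature combination (hypothetical helper): concatenate each image's visual and tag vectors."""
    if len(image_feats) != len(tag_feats):
        raise ValueError("both views must describe the same N images")
    return [list(x) + list(t) for x, t in zip(image_feats, tag_feats)]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def covariance(data: Matrix) -> Tuple[List[float], Matrix]:
    """Mean and d x d covariance: time O(N * d^2), extra space O(d^2)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in data:  # each of the N rows is visited once in this loop
        c = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return mean, cov


def pca(data: Matrix, k: int, iters: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Top-k principal directions via power iteration with deflation, plus projections."""
    mean, cov = covariance(data)
    d = len(cov)
    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        # Random unit start vector (fixed seed for reproducibility).
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(_dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # remaining variance is numerically zero
                break
            v = [x / norm for x in w]
        lam = _dot(v, _matvec(cov, v))  # variance captured along direction v
        components.append(v)
        # Deflation: subtract lam * v v^T so the next round finds the next direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Project each centered sample onto the k directions (the compressed representation).
    projected = [[_dot([x - m for x, m in zip(row, mean)], v) for v in components]
                 for row in data]
    return components, projected


if __name__ == "__main__":
    imgs = [[float(t), 2.0 * t] for t in range(5)]  # toy 2-D "visual" features
    tags = [[2.0 * t] for t in range(5)]            # toy 1-D "tag" feature
    comps, proj = pca(combine_features(imgs, tags), k=1)
    print([round(c, 4) for c in comps[0]], [round(p[0], 4) for p in proj])
```

Principal directions are only defined up to sign, so the printed values may come out negated. Concatenation is the simplest way to use both views; CCA, by contrast, learns separate projections for the image and tag views that maximize their mutual correlation, which this sketch does not attempt.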

Related resources

Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes

As the number of images increases, it becomes essential to provide fast search methods and intelligent filtering of images. To handle images in large datasets, relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on its visual content. AIA frameworks have two main sta...

DPML-Risk: An Efficient Algorithm for Image Registration

Registration and tracking of targets and objects in a sequence of images play an important role in various areas. One image registration method is the feature-based algorithm, which is accomplished in two steps. The first step consists of finding features of the sensed and reference images. In this step, a scale space is used to reduce the sensitivity of detected features to scale changes. Afterw...

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. Conventional image processing algorithms do not perform well in scenarios where the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...

TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets

Cross-view retrieval, which focuses on searching images in response to text queries or vice versa, has received increasing attention recently. Cross-view hashing aims to solve the cross-view retrieval problem efficiently with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding methods to tackle this problem, which inevitably causes information loss in both i...

Similarity measurement for describe user images in social media

Online social networks like Instagram are places for communication. Also, these media produce rich metadata which are useful for further analysis in many fields including health and cognitive science. Many researchers are using these metadata like hashtags, images, etc. to detect patterns of user activities. However, there are several serious ambiguities like how much reliable are these informa...

Publication date: 2014